CUBE CONNECT Edition Help

Using MPICORES and NCORES together

It is possible to use the NCORES and MPICORES parameters together to allocate CPU core resources for a particular problem. In this scenario, each core allocated via MPICORES starts a process that estimates user classes in parallel, while NCORES allows each of those processes to launch multiple threads to aid with the parallel computations required for each class run. Used in tandem, these two parameters can provide parallel processing power that exceeds the limits of either parameter alone.

As seen in the examination of the NCORES parameter above, individual class estimation problems produce diminishing returns in parallel execution speed as the number of cores is increased, because of the inherent constraint of the sequential code sections present in the program. The MPICORES parameter, on the other hand, essentially provides parallel runs of individual problems, so the sequential portions of code are performed in parallel regardless. The downside, as we have seen, is that execution can only be as fast as the longest-running user class, leaving cores idle once the other classes finish. By allocating the available CPU cores through a combination of NCORES and MPICORES, we can strike a balance between the two approaches and maximize parallel efficiency. To study this idea further, we will again refer to the two MPICORES case studies above and examine the performance effect of adding the NCORES parameter.
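The short sketch below (illustrative Python, not Cube script; the function name and the example values are ours) simply restates the allocation rules described above: the useful value of MPICORES is capped by the number of user classes, each process then uses NCORES threads, and the total number of cores in use is the capped MPICORES value multiplied by NCORES.

    # Illustrative sketch only -- this is not Cube code. It mirrors the
    # allocation rules described above: each core given via MPICORES starts a
    # process that works on user classes, MPICORES brings no benefit beyond
    # the number of user classes, and each process may use NCORES threads.

    def effective_core_usage(mpicores, ncores, num_user_classes):
        """Return (processes, threads_per_process, total_cores_in_use)."""
        # MPICORES above the number of user classes is not useful,
        # so cap the process count at the class count (as noted for Table 5).
        processes = min(mpicores, num_user_classes)
        return processes, ncores, processes * ncores

    # Example: 3 user classes on an 8-core machine with MPICORES=4, NCORES=2.
    # Only 3 processes are useful, so 3 * 2 = 6 cores are actually in use.
    print(effective_core_usage(mpicores=4, ncores=2, num_user_classes=3))
    # -> (3, 2, 6)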

Case 1: Medium Size Static Problem — Table 5 shows timing results for our first test case using a combination of cores allocated via the MPICORES and NCORES parameters. Increasing either parameter on its own reduced the runtime for this problem, but only modestly in both cases. Using a combination of the two yields much better performance than either parameter alone.

Case 2: Medium Size Dynamic Problem — Table 6 shows timing results for the second test case using a combination of cores allocated via the MPICORES and NCORES parameters. This case provides a good example of how using the MPICORES and NCORES parameters in unison can deliver excellent parallel speedup. The good balance between the user classes makes the MPICORES parameter very efficient with two cores, but because the problem has only two user classes this parameter cannot be increased further; adding cores through the NCORES parameter provides an extra boost.

Case 3: Large Size Static Problem — Table 7 shows timing results for the third test case using a combination of cores allocated via the MPICORES and NCORES parameters. While the NCORES parameter is clearly dominant over the MPICORES parameter for this problem, due to the large amount of time spent in the optimization loop, we still find that the 2-MPICORES, 4-NCORES combination edges out the 8-NCORES case. On a larger machine with more available cores, we would expect this runtime gap to widen as the diminishing returns from increasing the NCORES parameter take effect.
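The sketch below is a purely illustrative, Amdahl-style model and not measured Cube behavior; the per-class run time and serial fraction are hypothetical values chosen only to show the shape of the trade-off, and perfectly balanced user classes are assumed. It shows why splitting 8 cores as MPICORES=2 with NCORES=4 can beat NCORES=8 on a nine-class problem like Case 3: the serial portion of each class run limits the benefit of extra threads, while the two processes run whole classes side by side.

    # Purely illustrative model -- not measured Cube behavior.
    import math

    def class_time(base_time, serial_fraction, threads):
        """Amdahl-style estimate of the run time for one user class."""
        return base_time * (serial_fraction + (1.0 - serial_fraction) / threads)

    base_time = 100.0      # hypothetical single-thread time for one class
    serial_fraction = 0.2  # hypothetical serial fraction; not a measured value
    num_classes = 9        # as in case study problem 6 (Table 7)

    # All 8 cores as threads (NCORES=8): the classes run one after another.
    t_ncores_only = num_classes * class_time(base_time, serial_fraction, threads=8)

    # MPICORES=2 with NCORES=4: each process takes roughly half the classes,
    # so the wall time is set by the busier process (5 of the 9 classes).
    t_combined = math.ceil(num_classes / 2) * class_time(base_time, serial_fraction, threads=4)

    print(f"NCORES=8 only:        {t_ncores_only:.1f}")
    print(f"MPICORES=2, NCORES=4: {t_combined:.1f}")

With any nonzero serial fraction, the combined split comes out ahead in this simplified model, which is consistent with the pattern described above.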

Table 5: A comparison table for using a combination of MPICORES and NCORES on a system with 8 available cores, using case study problem 4, which has 3 user classes. Note that since the problem has only 3 user classes, the program automatically sets the MPICORES parameter to the highest useful value of 3, so the total number of cores in use is actually only 6 when NCORES=2. The top number of each entry indicates the total run time in seconds and the bottom number represents the speedup multiple over a single core.

Table 6: A comparison table for using a combination of MPICORES and NCORES on a system with 8 available cores, using case study problem 5, which has 2 user classes. The top number of each entry indicates the total run time in seconds and the bottom number represents the speedup multiple over a single core.

Table 7: A comparison table for using a combination of MPICORES and NCORES on a system with 8 available cores, using case study problem 6, which has 9 user classes. The top number of each entry indicates the total run time in seconds and the bottom number represents the speedup multiple over a single core.

From these cases a general conclusion can be drawn: the best reductions in runtime come from using a combination of MPICORES and NCORES whenever the problem being run contains more than one user class.